Abstract

We prove several results concerning classifications, based on successive observations
$(X_1, \dots, X_n)$ of an unknown stationary and ergodic process, for membership
in a given class of processes, such as the class of all finite order Markov chains.
1 Introduction and Statement of Results

If $\mathcal{G}$ is a subclass of all stationary and ergodic binary processes then a sequence of functions
$g_n : \{0,1\}^n \to \{\text{YES}, \text{NO}\}$ is a classification for $\mathcal{G}$ in probability if

$$\lim_{n\to\infty} P(g_n(X_1, \dots, X_n) = \text{YES}) = 1$$

for all processes in $\mathcal{G}$, and

$$\lim_{n\to\infty} P(g_n(X_1, \dots, X_n) = \text{NO}) = 1$$

for all processes not in $\mathcal{G}$.

Similarly, $g_n : \{0,1\}^n \to \{\text{YES}, \text{NO}\}$ is a classification for $\mathcal{G}$ in a pointwise sense if

$$g_n(X_1, \dots, X_n) = \text{YES} \quad \text{eventually almost surely}$$

for all processes in $\mathcal{G}$, and

$$g_n(X_1, \dots, X_n) = \text{NO} \quad \text{eventually almost surely}$$

for all processes not in $\mathcal{G}$. Of course, if $g_n$ is a classification in a pointwise sense then
it is a classification in probability, but a classification in probability is not necessarily a
classification in a pointwise sense.

For the class $\mathcal{M}_k$ of $k$-step mixing Markov chains of fixed order $k$, there is a pointwise
classification of the type we have just described. (For mixing Markov chains see Proposition
I.2.10 in Shields (1996).) Bailey (1976) carried this out in detail for independent processes
and indicated how to generalize his result to the class $\mathcal{M}_k$. For the class
$\mathcal{M}_{mix} = \bigcup_{k=0}^{\infty} \mathcal{M}_k$ of mixing Markov chains of any order,
Bailey showed that no such classification exists.
See Ornstein and Weiss (1990) for some further results on this kind of question. Our
concern in this paper is with the class of finitarily Markovian processes, which is defined
as follows.

Let $\{X_n\}_{n=1}^{\infty}$ be a stationary and ergodic binary time series. A one sided stationary time
series can always be thought to be a two sided time series $\{X_n\}_{n=-\infty}^{\infty}$. For
$m \le n$ let $X_m^n = (X_m, \dots, X_n)$.

Definition: A stationary and ergodic binary time series $\{X_n\}$ is said to be finitarily
Markovian if for almost every $x_{-\infty}^{-1}$ there is a finite $K = K(x_{-\infty}^{-1})$ such that for all $i > 0$ and
$y_{-K-i}^{-K-1}$, if $P(X_{-K-i}^{-K-1} = y_{-K-i}^{-K-1}, X_{-K}^{-1} = x_{-K}^{-1}) > 0$ then

$$P(X_0 = 1 \mid X_{-K}^{-1} = x_{-K}^{-1}) = P(X_0 = 1 \mid X_{-K-i}^{-K-1} = y_{-K-i}^{-K-1}, X_{-K}^{-1} = x_{-K}^{-1}).$$

This class includes all finite order Markov chains (mixing or not) and many other processes
such as the finitarily deterministic processes of Kalikow, Katznelson and Weiss (1992).

Example 1 First we define a Markov process which serves as the technical tool for our
construction. Let the state space $S$ be the non-negative integers. The transition probabilities
are as follows: with probability one move from 0 to 1 and from 1 to 2; for all $s \ge 2$
move with equal probability 0.5 to 0 and to $s+1$. This construction yields a stationary and
ergodic Markov process $\{M_i\}$ with stationary distribution

$$P(M_i = 0) = P(M_i = 1) = \frac{1}{4}$$

and

$$P(M_i = j) = \frac{1}{2^j} \quad \text{for } j \ge 2.$$

Now we define the binary hidden Markov chain $\{X_i\}$, which we denote as $X_i = f(M_i)$.
Let $f(0) = 0$, $f(1) = 0$, and $f(s) = 1$ for all even states $s \ge 2$. A feature of this definition of
$f(\cdot)$ is that whenever $X_n = 0, X_{n+1} = 0, X_{n+2} = 1$ we know that $M_n = 0$, and vice versa.
Consider the class of processes of the above form for all possible labelings of the rest of
the states by zero and one. (It is easy to see that this class contains Markov chains of
order $\le r + 1$, e.g. when $f(s) = 1$ for all $s > r$, and processes which are not Markov of
any order, e.g. when $f(2^i + 1) = 0$ for $i = 1, 2, \dots$ and $f(s) = 1$ for the rest of the yet unlabeled
odd states $s$.) This class is a subclass of all stationary and ergodic binary
finitarily Markovian processes. (Clearly, the conditional probability $P(X_1 = 1 \mid X_{-\infty}^{0})$ does
not depend on values beyond the first (going backward) occurrence of 001.) Györfi, Morvai
and Yakowitz (1998) proved that there is no estimator of the value $P(X_{n+1} = 1 \mid X_1^n)$ from
samples $X_1^n$ such that the error tends to zero as $n$ tends to infinity in the pointwise sense
for this class of processes.
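The chain of Example 1 is easy to simulate. The following is a minimal sketch, not part of the original construction: the function names (`step`, `stationary`, `f`, `sample_path`) and the `odd_label` hook for labeling the free odd states are hypothetical, and the path is started in state 0 rather than from the stationary law. It checks the stated stationary distribution exactly and the claim that the pattern 001 marks exactly the visits to state 0.

```python
import random
from fractions import Fraction

def step(s, rng):
    """One transition: 0 -> 1, 1 -> 2, and s >= 2 -> 0 or s + 1 with probability 1/2 each."""
    if s < 2:
        return s + 1
    return 0 if rng.random() < 0.5 else s + 1

def stationary(j):
    """Stationary probability of state j: 1/4 for j in {0, 1}, 2^-j for j >= 2."""
    return Fraction(1, 4) if j < 2 else Fraction(1, 2 ** j)

def f(s, odd_label=None):
    """f(0) = f(1) = 0, f(even s >= 2) = 1; odd states >= 3 are free (default label 1)."""
    if s < 2:
        return 0
    if s % 2 == 0:
        return 1
    return 1 if odd_label is None else odd_label(s)

def sample_path(n, seed=0, odd_label=None):
    """Return (states, xs) for n steps, started in state 0 (an assumption of this sketch)."""
    rng = random.Random(seed)
    s, states, xs = 0, [], []
    for _ in range(n):
        states.append(s)
        xs.append(f(s, odd_label))
        s = step(s, rng)
    return states, xs
```

For instance, `sample_path(3)[1]` is `[0, 0, 1]`, since the first two transitions are deterministic.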

Example 2 Let $\{M_n\}$ be any stationary and ergodic first order Markov chain with finite
or countably infinite state space $S$. Let $s \in S$ be an arbitrary state with $P(M_1 = s) > 0$.
Now let $X_n = I_{\{M_n = s\}}$. By Shields (1996), Chapter I.2.c.1, the binary time series $\{X_n\}$ is
stationary and ergodic. It is also finitarily Markovian. (Indeed, the conditional probability
$P(X_1 = 1 \mid X_{-\infty}^{0})$ does not depend on values beyond the first (going backwards) occurrence
of one in $X_{-\infty}^{0}$, which identifies the first (going backwards) occurrence of state $s$ in the
Markov chain $\{M_n\}$.) The resulting time series $\{X_n\}$ is not a Markov chain of any
order in general. (Indeed, consider the Markov chain $\{M_n\}$ with state space $S = \{0, 1, 2\}$
and transition probabilities $P(M_2 = 1 \mid M_1 = 0) = P(M_2 = 2 \mid M_1 = 1) = 1$,
$P(M_2 = 0 \mid M_1 = 2) = P(M_2 = 1 \mid M_1 = 2) = 0.5$. This yields a stationary and ergodic Markov chain $\{M_n\}$,
cf. Example I.2.8 in Shields (1996). Clearly, the resulting time series $X_n = I_{\{M_n = 0\}}$ will
not be Markov of any order. The conditional probability $P(X_1 = 0 \mid X_{-\infty}^{0})$ depends on
whether, until the first (going backwards) occurrence of one, you see an even or odd number of
zeros.) These examples include all stationary and ergodic binary renewal processes with
finite expected inter-arrival times, a basic class for many applications. (A stationary and
ergodic binary renewal process is defined as a stationary and ergodic binary process such
that the times between occurrences of ones are independent and identically distributed
with finite expectation, cf. Chapter I.2.c.1 in Shields (1996).)
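The parity dependence in the three-state chain of Example 2 can be checked mechanically. This is a small sketch under the example's own transition probabilities; the helper names `state_after_zeros` and `prob_next_one` are hypothetical. After a one (the chain is in state 0) followed by $z$ zeros the hidden state is determined, so the conditional probability of seeing a one next is just the probability of moving into state 0.

```python
from fractions import Fraction

# Transition probabilities of the three-state chain in Example 2.
P = {
    0: {1: Fraction(1)},
    1: {2: Fraction(1)},
    2: {0: Fraction(1, 2), 1: Fraction(1, 2)},
}

def state_after_zeros(z):
    """Hidden state after a one (M = 0) followed by z observed zeros."""
    s = 0
    for _ in range(z):
        # Observing X = 0 rules out a move into state 0.
        moves = [t for t in P[s] if t != 0]
        assert len(moves) == 1  # the remaining move is unique in this chain
        s = moves[0]
    return s

def prob_next_one(z):
    """P(next X = 1 | a one followed by z zeros) = P(move into state 0)."""
    return P[state_after_zeros(z)].get(0, Fraction(0))
```

An odd run of zeros leaves the chain in state 1, so the next symbol is surely zero; an even run of positive length leaves it in state 2, so the next symbol is one with probability 1/2. No fixed finite window can tell these cases apart for arbitrarily long runs.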

Our main result is that there is no classification for membership in the class of finitarily
Markovian processes. As a byproduct we will also improve Bailey's result from mixing
Markov chains to the class of Markov chains. Our results apply to both pointwise
classifications and classifications in probability.



Theorem 1 Given a sequence of functions $g_n : \{0,1\}^n \to \{\text{YES}, \text{NO}\}$ such that

• for all stationary and ergodic binary Markov chains $\{X_n\}$ with arbitrary finite order

$$\lim_{n\to\infty} P(g_n(X_1^n) = \text{YES}) = 1 \qquad (1)$$

• for all stationary and ergodic binary non finitarily Markovian processes

$$\lim_{n\to\infty} P(g_n(X_1^n) = \text{NO}) = 1 \qquad (2)$$

we construct a single stationary and ergodic binary process $\{X_n\}$ such that

$$\limsup_{n\to\infty} P(g_n(X_1^n) = \text{YES}) = 1 \quad \text{and} \quad \limsup_{n\to\infty} P(g_n(X_1^n) = \text{NO}) = 1.$$

Corollary 1 There is no classification for the class of all stationary and ergodic binary 
Markov chains with arbitrary finite order, in a pointwise sense or in probability. 



Remark 1 For motivation consider the universal intermittent estimation problem where
the goal is to find stopping times $\tau_k$ such that one can estimate $P(X_{\tau_k+1} = 1 \mid X_1^{\tau_k})$ from
samples $X_1^{\tau_k}$ in the pointwise sense for all stationary and ergodic binary time series.
Such a universal scheme was proposed in Morvai (2003). Unfortunately the stopping
times of Morvai (2003) grow very rapidly. Had one been able to classify Markov chains against
non-Markov chains, one could have improved the scheme of Morvai so that it
remained universally pointwise consistent for all stationary and ergodic processes
and, in particular, if the process turned out to be Markov, one could have estimated the
conditional probability $P(X_{k+1} = 1 \mid X_1^k)$ eventually for all $k$, that is, $\tau_{n+1} = \tau_n + 1$
eventually. Indeed, if $g_n(X_1^n)$ classified the process as Markov then one could simply use a
Markov order estimator (e.g. that of Csiszár and Shields (2000)) and count frequencies of
blocks with length equal to the order; this estimator is consistent in the pointwise sense
for Markov chains. Otherwise one could use the universal estimator of Morvai (2003).

Corollary 2 There is no classification for the class of all stationary and ergodic binary 
finitarily Markovian processes, in a pointwise sense or in probability. 

Remark 2 Concerning the above mentioned intermittent estimation problem, one could
have improved the universal estimator of Morvai (2003) for finitarily Markovian processes.
Had $g_n(X_1^n)$ classified the process as a finitarily Markovian process, one could use
the stopping times and estimator of, e.g., Morvai and Weiss (2003). That estimator
is not universal, but it works for all finitarily Markovian processes and the growth of its
stopping times is much more moderate than that of the stopping times associated with the
universal estimator in Morvai (2003). For non finitarily Markovian processes one could
use the universal estimator of Morvai (2003).

2 Proofs 

The following lemma is well known. 
Lemma 1 Let $\{X_n\}$ be a stationary and ergodic binary time series and $N$ a positive
integer. Then there is a stationary and ergodic binary Markov chain $\{Z_n\}$ of some finite
order $\le N$ such that the $N$ dimensional distributions of $\{X_n\}$ and $\{Z_n\}$ are identical.

Proof: Put $P(Z_{n+1} = z \mid Z_{n-N+1}^{n} = x_1^N) = P(X_{N+1} = z \mid X_1^N = x_1^N)$. This yields a stationary
and ergodic Markov chain $\{Z_n\}$ of some finite order $\le N$ with the original marginal
distribution $P(Z_1^N = x_1^N) = P(X_1^N = x_1^N)$, that is, for $n > N$, define

$$P(Z_1^n = x_1^n) = P(Z_1^N = x_1^N) \prod_{i=N+1}^{n} P(Z_i = x_i \mid Z_{i-N}^{i-1} = x_{i-N}^{i-1}).$$

Clearly $\{Z_n\}$ is a stationary Markov chain of some finite order $\le N$ since $\{X_n\}$ was
stationary. The chain $\{Z_n\}$ can be thought of as a one step Markov chain by passing to
$N$-tuples. The ergodicity of the $\{X_n\}$ process guarantees that this chain is irreducible
when considered as a chain on those $N$-tuples which have positive measure under the
distribution of $X_1^N$. The process $\{Z_n\}$ is also ergodic since stationary binary irreducible
Markov chains of some finite order are ergodic by Proposition I.2.9 in Shields (1996).
(Cf. also Kemeny and Snell (1960).) The proof of Lemma 1 is complete.
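The Markov approximation of Lemma 1 can be tried out on a concrete process. The sketch below is illustrative only: it uses the binary process $X_n = I_{\{M_n = 0\}}$ from Example 2 as input, computes its block probabilities exactly with a forward recursion, and builds the chain $\{Z_n\}$ with transitions conditioned on the previous $N$ symbols. The names `block_prob` and `markov_approx_prob` and the stationary vector `PI` (obtained by solving the balance equations of that three-state chain) are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

# Three-state chain of Example 2 and its stationary distribution.
P = {0: {1: Fraction(1)}, 1: {2: Fraction(1)},
     2: {0: Fraction(1, 2), 1: Fraction(1, 2)}}
PI = {0: Fraction(1, 5), 1: Fraction(2, 5), 2: Fraction(2, 5)}

def emits(state, symbol):
    """X = 1 exactly when the hidden state is 0."""
    return (state == 0) == (symbol == 1)

def block_prob(x):
    """Exact P(X_1^n = x) for a tuple x of bits, via the forward recursion."""
    alpha = {s: p for s, p in PI.items() if emits(s, x[0])}
    for sym in x[1:]:
        nxt = {}
        for s, a in alpha.items():
            for t, p in P[s].items():
                if emits(t, sym):
                    nxt[t] = nxt.get(t, Fraction(0)) + a * p
        alpha = nxt
    return sum(alpha.values(), Fraction(0))

def markov_approx_prob(x, N):
    """P(Z_1^n = x) for the chain Z whose transitions condition on the last N symbols."""
    if len(x) <= N:
        return block_prob(x)
    prob = block_prob(x[:N])
    for i in range(N, len(x)):
        ctx = x[i - N:i]
        denom = block_prob(ctx)
        if denom == 0:
            return Fraction(0)
        prob *= block_prob(ctx + (x[i],)) / denom
    return prob

# Usage: the approximation reproduces the short block probabilities of X.
for x in product((0, 1), repeat=3):
    assert markov_approx_prob(x, 2) == block_prob(x)
```

The chain agrees with the original process on short blocks by construction; on longer blocks the two laws may differ, which is what the proof of Theorem 1 exploits.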

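Definition: The entropy rate $H$ associated with a stationary binary time series $\{X_n\}$ is
defined as

$$H = -E\left\{ P(X_0 = 1 \mid X_{-\infty}^{-1}) \log_2 P(X_0 = 1 \mid X_{-\infty}^{-1}) + P(X_0 = 0 \mid X_{-\infty}^{-1}) \log_2 P(X_0 = 0 \mid X_{-\infty}^{-1}) \right\}.$$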



Lemma 2 Given a stationary and ergodic binary process $\{X_n\}$, an integer $N > 0$ and
a real number $0 < \delta < 1$, there exists a stationary and ergodic non finitarily Markovian
process $\{Y_n\}$ such that

$$\sum_{y_1^N \in \{0,1\}^N} \left| P(X_1^N = y_1^N) - P(Y_1^N = y_1^N) \right| < \delta. \qquad (3)$$

Proof: Let $\{Z_n\}$ be a stationary and ergodic binary time series with zero entropy rate
such that all finite words have positive probability. It is well known that such processes
exist. For the sake of completeness we supply a proof in Lemma 3 in the Appendix. This
process is clearly not finitarily Markovian.

By ergodicity of the $\{X_n\}$ process, there exists an $r$ and a word $w_1^r$ such that the empirical
counts of all $N$-blocks from $w_1^r$ are $\delta/2^{N+1}$ close to the probabilities corresponding to the
$\{X_n\}$ process.

We would like to define a process in which we alternate between the fixed word $w_1^r$ and
the $Z_n$'s: $Z_1, w_1^r, Z_2, w_1^r, \dots$. If we can do this and identify uniquely the position of the
$Z_n$'s then this process will not be finitarily Markovian. In order to uniquely identify
the positions of the $Z_n$'s we will add a synchronizing word $u_1^m$ whose length $m$ is very
small compared to the length of $w_1^r$ and which appears only where we place it. The
fact that its length is small means that the finite distributions will remain close to the
finite distributions of the $\{X_n\}$ process. To synchronize we need to know that when
looking across a string like $Z_1, u_1^m, w_1^r, Z_2, u_1^m, w_1^r, Z_3$, the word $u_1^m$ appears only in the two
locations where it is written.

Now choose some word $u_1^m$ with length $m = [10 \log_2 r]$ such that this word does not
appear in the word $w_1^r$ and it has no reasonable non-trivial self overlap. More precisely,
there is no non-trivial self overlap greater than $(2/5)m$ and there is no overlap with $w_1^r$
greater than $(2/5)m$. The number of words with length $m$ which have greater self overlap
is at most $2m\,2^{(3/5)m}$. The number of words of length $m$ which have overlap with $w_1^r$ greater
than $(2/5)m$ but are not completely contained in $w_1^r$ is at most $2m\,2^{(3/5)m}$. The number of words
with length $m$ completely contained in $w_1^r$ is at most $r$. Summing up the number of these
possible bad words we get

$$r + 4m\,2^{(3/5)m} < 2^m.$$

Thus there is at least one word $u_1^m$ with the desired property. The word $u_1^m$ will serve as
a synchronizing word.
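The counting inequality and the overlap conditions can be checked numerically on small cases. The following sketch is hedged in two ways: it reads the bracket in $m = [10 \log_2 r]$ as rounding up (the text does not say which rounding is meant), and the helper names are hypothetical. Words are represented as strings over `"01"`.

```python
import math

def counting_bound_holds(r):
    """Check r + 4 m 2^(3m/5) < 2^m with m = ceil(10 log2 r) (rounding is an assumption)."""
    m = math.ceil(10 * math.log2(r))
    return r + 4 * m * 2 ** (3 * m / 5) < 2 ** m

def max_self_overlap(u):
    """Largest k < len(u) such that the length-k prefix of u equals its length-k suffix."""
    for k in range(len(u) - 1, 0, -1):
        if u[:k] == u[-k:]:
            return k
    return 0

def max_overlap_with(u, w):
    """Largest k < len(u) such that u hangs over either end of w by k symbols."""
    for k in range(min(len(u) - 1, len(w)), 0, -1):
        if u[-k:] == w[:k] or u[:k] == w[-k:]:
            return k
    return 0

def is_synchronizing(u, w):
    """u does not occur in w, and both kinds of overlap are at most (2/5) len(u)."""
    m = len(u)
    return (u not in w
            and 5 * max_self_overlap(u) <= 2 * m
            and 5 * max_overlap_with(u, w) <= 2 * m)
```

With this reading of the bracket the inequality fails for $r = 2$ but holds for $r = 4$ and $r = 1024$, which is harmless here since $r$ is taken large.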

We will define the desired $\{Y_n\}$ process in two steps. First we will define a nonstationary
process $\{W_n\}$ as follows. Write $n - 1 = \eta(m + r + 1) + \theta$, where $0 \le \theta \le m + r$
and $\eta \ge 0$. The process $\{W_n\}$ will be obtained by inserting a fixed block $u_1^m, w_1^r$ of
length $m + r$ between successive symbols of the process $\{Z_n\}$. Define the process $\{W_n\}$
as follows. Let

$$W_n = \begin{cases} Z_{\eta+1} & \text{if } \theta = 0, \\ u_\theta & \text{if } 1 \le \theta \le m, \\ w_{\theta - m} & \text{if } m + 1 \le \theta \le m + r. \end{cases}$$

Our assumptions on the synchronizing word imply that such a process will not be stationary,
and to ensure stationarity we need to randomize over the $m + r + 1$ possible shifts. Here is a formal
description. Let $\zeta$ be distributed on $\{0, \dots, m + r\}$ uniformly. Let $\zeta$ be independent from
$\{W_n\}$. Define $\{Y_n\}$ as $Y_n = W_{n+\zeta}$. (That is, $\{Y_n\}$ is constructed from $\{W_n\}$ by averaging
over the $m + r + 1$ shifts of the $\{W_n\}$ process.)
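The index arithmetic of the interleaving is easy to get wrong by one, so a direct transcription may help. This is a sketch only; `w_symbol` and `y_block` are hypothetical names, and the sequences $Z$, $u$, $w$ are passed as 0-indexed Python lists holding $Z_1, Z_2, \dots$, $u_1, \dots, u_m$ and $w_1, \dots, w_r$.

```python
import random

def w_symbol(n, z, u, w):
    """W_n for n >= 1, following n - 1 = eta (m + r + 1) + theta."""
    m, r = len(u), len(w)
    eta, theta = divmod(n - 1, m + r + 1)
    if theta == 0:
        return z[eta]            # Z_{eta + 1}
    if theta <= m:
        return u[theta - 1]      # u_theta
    return w[theta - m - 1]      # w_{theta - m}

def y_block(length, z, u, w, rng):
    """Return (Y_1..Y_length, zeta) with Y_n = W_{n + zeta}, zeta uniform on {0, ..., m + r}."""
    zeta = rng.randrange(len(u) + len(w) + 1)
    return [w_symbol(n + zeta, z, u, w) for n in range(1, length + 1)], zeta

# Usage: Z = 1, 0, ... with u = 00 and w = 111 gives the pattern Z_1 u w Z_2 u w.
z, u, w = [1, 0, 1], [0, 0], [1, 1, 1]
first_twelve = [w_symbol(n, z, u, w) for n in range(1, 13)]
```

Here `first_twelve` reads `1 00 111 0 00 111`, that is, each symbol of $Z$ followed by the fixed block $u_1^m, w_1^r$.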

The fact that $u_1^m$ was synchronizing means that $\zeta$ is a function of the $\{Y_n\}$ process.
Thus from $\{Y_n\}$ one recovers exactly the $\{Z_n\}$ process. Now $\{Y_n\}$ is a stationary and
ergodic binary non finitarily Markovian time series since $\{Z_n\}$ was such. To see that (3)
is satisfied one uses the property of $w_1^r$ and takes $r$ sufficiently large so that the edge
effects caused by $u_1^m$ are negligible. The proof of Lemma 2 is complete.
Proof of Theorem 1. To construct $\{X_n\}$ we will alternately use the two lemmas to
construct a sequence of processes $\{Y_n^{(i)}\}$, which for odd $i$ will be a Markov chain and for
even $i$ will not even be finitarily Markovian, but the entire sequence will converge to an
ergodic process which will have the required properties. Here is how this is done.

Let $0 < \epsilon_k < 1$ such that $\epsilon_k \to 0$ and $0 < \delta_k < 1$ such that $\sum_{k=1}^{\infty} \delta_k < 0.25$. We construct
our process as follows: Let $\{Y_n^{(1)}\}$ be independent and identically distributed random
variables assuming the values $\{0, 1\}$ with equal probabilities. Let $N_1 > 1$ be so large that

$$P(g_{N_1}(Y_1^{(1)}, \dots, Y_{N_1}^{(1)}) = \text{YES}) > 1 - \epsilon_1,$$

and there exists a set $\mathcal{U}_{N_1} \subseteq \{0,1\}^{N_1}$ such that $P((Y_1^{(1)}, \dots, Y_{N_1}^{(1)}) \in \mathcal{U}_{N_1}) > 1 - \epsilon_1$ and

$$\max_{u_1^{N_1}, v_1^{N_1} \in \mathcal{U}_{N_1}} \sum_{x \in \{0,1\}} \left| \frac{1}{N_1} \sum_{i=0}^{N_1 - 1} \left( I_{\{u_{i+1} = x\}} - I_{\{v_{i+1} = x\}} \right) \right| < \epsilon_1.$$



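Assume for $k = 2, \dots, i - 1$ we have already defined a sequence of stationary and ergodic
binary time series $\{Y_n^{(k)}\}$ and positive integers $N_k > k^2$ and sets $\mathcal{U}_{N_k} \subseteq \{0,1\}^{N_k}$ such
that $P((Y_1^{(k)}, \dots, Y_{N_k}^{(k)}) \in \mathcal{U}_{N_k}) > 1 - \epsilon_k$,

$$\sum_{y_1^{N_{k-1}} \in \{0,1\}^{N_{k-1}}} \left| P(Y_1^{(k-1)} = y_1, \dots, Y_{N_{k-1}}^{(k-1)} = y_{N_{k-1}}) - P(Y_1^{(k)} = y_1, \dots, Y_{N_{k-1}}^{(k)} = y_{N_{k-1}}) \right| < \delta_{k-1},$$

$$\max_{u_1^{N_k}, v_1^{N_k} \in \mathcal{U}_{N_k}} \sum_{x_1^k \in \{0,1\}^k} \left| \frac{1}{N_k - k + 1} \sum_{j=0}^{N_k - k} \left( I_{\{u_{j+1}^{j+k} = x_1^k\}} - I_{\{v_{j+1}^{j+k} = x_1^k\}} \right) \right| < \epsilon_k \qquad (4)$$

and

if $k$ is even then $\{Y_n^{(k)}\}$ is not finitarily Markovian and

$$P(g_{N_k}(Y_1^{(k)}, \dots, Y_{N_k}^{(k)}) = \text{NO}) > 1 - \epsilon_k,$$

if $k$ is odd then $\{Y_n^{(k)}\}$ is a Markov chain with some order and

$$P(g_{N_k}(Y_1^{(k)}, \dots, Y_{N_k}^{(k)}) = \text{YES}) > 1 - \epsilon_k.$$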



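Now we define it for $i$. If $i$ is odd then apply Lemma 1 for $\{Y_n^{(i-1)}\}$ with $N_{i-1}$. Let $\{Y_n^{(i)}\}$
denote the resulting stationary and ergodic binary Markov chain. Now let $N_i > i^2$ be so
large that

$$P(g_{N_i}(Y_1^{(i)}, \dots, Y_{N_i}^{(i)}) = \text{YES}) > 1 - \epsilon_i$$

and there is a set $\mathcal{U}_{N_i} \subseteq \{0,1\}^{N_i}$ such that $P((Y_1^{(i)}, \dots, Y_{N_i}^{(i)}) \in \mathcal{U}_{N_i}) > 1 - \epsilon_i$ and

$$\max_{u_1^{N_i}, v_1^{N_i} \in \mathcal{U}_{N_i}} \sum_{x_1^i \in \{0,1\}^i} \left| \frac{1}{N_i - i + 1} \sum_{j=0}^{N_i - i} \left( I_{\{u_{j+1}^{j+i} = x_1^i\}} - I_{\{v_{j+1}^{j+i} = x_1^i\}} \right) \right| < \epsilon_i.$$

By assumption (1) and the ergodicity of $\{Y_n^{(i)}\}$ there exists such an $N_i$.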

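If $i$ is even then apply Lemma 2 for $\{Y_n^{(i-1)}\}$ with $N_{i-1}$ and $\delta_{i-1}$. Let $\{Y_n^{(i)}\}$ denote the
resulting non finitarily Markovian process. Now let $N_i > i^2$ be so large that

$$P(g_{N_i}(Y_1^{(i)}, \dots, Y_{N_i}^{(i)}) = \text{NO}) > 1 - \epsilon_i$$

and there is a set $\mathcal{U}_{N_i} \subseteq \{0,1\}^{N_i}$ such that $P((Y_1^{(i)}, \dots, Y_{N_i}^{(i)}) \in \mathcal{U}_{N_i}) > 1 - \epsilon_i$ and

$$\max_{u_1^{N_i}, v_1^{N_i} \in \mathcal{U}_{N_i}} \sum_{x_1^i \in \{0,1\}^i} \left| \frac{1}{N_i - i + 1} \sum_{j=0}^{N_i - i} \left( I_{\{u_{j+1}^{j+i} = x_1^i\}} - I_{\{v_{j+1}^{j+i} = x_1^i\}} \right) \right| < \epsilon_i.$$

By assumption (2) and the ergodicity of $\{Y_n^{(i)}\}$ there exists such an $N_i$.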



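Now it follows from the construction that for any $n \le N_k$ and $k < K$,

$$\sum_{x_1^n \in \{0,1\}^n} \left| P(Y_1^{(k)} = x_1, \dots, Y_n^{(k)} = x_n) - P(Y_1^{(K)} = x_1, \dots, Y_n^{(K)} = x_n) \right| < \sum_{i=k}^{\infty} \delta_i,$$

which tends to zero as $k \to \infty$. Now define $\{X_n\}$ in the following way: For each $n$ let

$$P(X_1^n = x_1^n) = \lim_{k\to\infty} P(Y_1^{(k)} = x_1, \dots, Y_n^{(k)} = x_n).$$

Clearly $\{X_n\}$ is stationary since all $\{Y_n^{(k)}\}$ were stationary. Since
$P((X_1, \dots, X_{N_k}) \in \mathcal{U}_{N_k}) > 1 - \epsilon_k - \sum_{i=k}^{\infty} \delta_i$, by (4) and Lemma 4 in the Appendix, $\{X_n\}$ is also
ergodic. Now it follows from the construction that

$$\sum_{x_1^n \in \{0,1\}^n} \left| P(X_1^n = x_1^n) - P(Y_1^{(k)} = x_1, \dots, Y_n^{(k)} = x_n) \right| < \sum_{i=k}^{\infty} \delta_i.$$

Thus for $k$ even,

$$P(g_{N_k}(X_1, \dots, X_{N_k}) = \text{NO}) > 1 - \epsilon_k - \sum_{i=k}^{\infty} \delta_i$$

and the right hand side tends to 1 as $k \to \infty$. Similarly, when $k$ is odd,

$$P(g_{N_k}(X_1, \dots, X_{N_k}) = \text{YES}) > 1 - \epsilon_k - \sum_{i=k}^{\infty} \delta_i$$

and the right hand side tends to 1 as $k \to \infty$. The proof of Theorem 1 is complete.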
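The tail bound $\sum_{i \ge k} \delta_i$ used twice in the proof comes from the triangle inequality. A short derivation is sketched here, writing $P^{(i)}(x_1^n)$ as shorthand for $P(Y_1^{(i)} = x_1, \dots, Y_n^{(i)} = x_n)$ and assuming (as the construction suggests but does not state explicitly) that the $N_i$ are nondecreasing, so that an $n$-block with $n \le N_k$ is a marginal of every $N_i$-block with $i \ge k$:

```latex
\begin{align*}
\sum_{x_1^n} \bigl| P^{(k)}(x_1^n) - P^{(K)}(x_1^n) \bigr|
  &\le \sum_{i=k}^{K-1} \sum_{x_1^n} \bigl| P^{(i)}(x_1^n) - P^{(i+1)}(x_1^n) \bigr| \\
  &\le \sum_{i=k}^{K-1} \sum_{y_1^{N_i}} \bigl| P^{(i)}(y_1^{N_i}) - P^{(i+1)}(y_1^{N_i}) \bigr| \\
  &< \sum_{i=k}^{K-1} \delta_i \;\le\; \sum_{i=k}^{\infty} \delta_i .
\end{align*}
```

The second step uses that summing out coordinates cannot increase the total variation sum, and the third uses the inductive bound between consecutive processes (with difference zero when Lemma 1 is applied and less than $\delta_i$ when Lemma 2 is applied).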



3 Appendix 

We present now the proofs of two fairly standard lemmas that we used before. 

Lemma 3 There exists a stationary and ergodic time series {Zn} with zero entropy rate 
such that all finite words have positive probability. 

Proof: Let $T : [0,1] \to [0,1]$ denote the mapping $x \mapsto x + \alpha \bmod 1$ where $\alpha$ is a
fixed irrational. Denote the Lebesgue measure on $[0,1]$ by $\mu$. For a measurable subset
$A$ of $[0,1]$ let $\tau_A(x) = \min\{n \ge 1 : T^n x \in A\}$ denote the first return time to $A$.
Partition $A$ into $A_k = \{x \in A : \tau_A(x) = k\}$. Note that $\{T^i A_k : 0 \le i < k\}$ are
disjoint sets. We will define a particular set $A$ with the property that for all $k$ the sets $A_k$
will have positive measure. Indeed, one can choose inductively points $\{x_n\}$ and $\delta_n > 0$,
with $\sum_{m=n+1}^{\infty} \delta_m < 0.1\,\delta_n$, sufficiently small so that if $I_n = [x_n - \delta_n, x_n + \delta_n]$ the $A$ defined as
follows will have the required property:

$$A = \bigcup_{n=1}^{\infty} I_n.$$



It is easy to see that for all $k$, $\mu(A_k) > 0$. In this case we can list all binary words
with finite length, $\{0, 1, 00, 01, \dots\} = \{w_1, w_2, \dots\}$, and denote by $|w_k|$ the length of $w_k$.
Define a partition of $[0,1]$ into two sets $\{P_0, P_1\}$ by taking the $k$-th word $w_k$ in the list
and assigning the first $|w_k|$ sets of $T^0 A_k, T^1 A_k, \dots, T^{k-1} A_k$ to $P_0$ or $P_1$ according
to the symbols in $w_k$ and then assign to $P_0$ all remaining points in $[0,1]$. Finally define
a stationary and ergodic binary process as follows: Choose $x$ uniformly on $[0,1]$ and set

$$Z_n = j \quad \text{if } T^n x \in P_j, \qquad j \in \{0, 1\}.$$



It is clear that all finite words have positive probability. Furthermore it is well known
that any process defined by an irrational rotation as above is stationary and ergodic and
has zero entropy, cf. Cornfeld et al. (1982). The proof of Lemma 3 is complete.
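The zero entropy of rotation codings can be seen numerically through slow growth of the number of distinct blocks. The sketch below does not implement the partition $\{P_0, P_1\}$ of the proof; as a simplified stand-in it codes the rotation by the two intervals $[0, \beta)$ and $[\beta, 1)$, for which at most $2L$ distinct blocks of length $L$ can occur (so not all words appear, unlike in Lemma 3). The function names, the choice $\alpha = \sqrt{2} - 1$, the starting point and $\beta = 1/2$ are all assumptions of this illustration.

```python
import math

def rotation_process(n, alpha=math.sqrt(2) - 1, x0=0.1, beta=0.5):
    """Code the orbit of x -> x + alpha mod 1 by the partition {[0, beta), [beta, 1)}."""
    return [1 if (x0 + k * alpha) % 1.0 < beta else 0 for k in range(n)]

def distinct_blocks(seq, length):
    """Number of distinct blocks of the given length occurring in seq."""
    return len({tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)})

# Usage: block counts grow at most linearly, far below the 2^L of a fair coin.
seq = rotation_process(20000)
counts = [distinct_blocks(seq, length) for length in range(1, 11)]
```

Linear growth of the block count forces the entropy rate to be zero, since $\log_2(2L)/L \to 0$. The construction in the proof keeps zero entropy while making every finite word appear with positive probability.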

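Lemma 4 A binary stationary time series $\{X_n\}$ is ergodic if there is a sequence of
positive integers $N_k > k^2$ tending to $\infty$, $\epsilon_k > 0$ tending to zero and a sequence of sets
$\mathcal{U}_{N_k} \subseteq \{0,1\}^{N_k}$ with probability greater than $1 - \epsilon_k$ such that for all $u_1^{N_k}, v_1^{N_k} \in \mathcal{U}_{N_k}$,

$$\sum_{x_1^k \in \{0,1\}^k} \left| \frac{1}{N_k - k + 1} \sum_{j=0}^{N_k - k} \left( I_{\{u_{j+1}^{j+k} = x_1^k\}} - I_{\{v_{j+1}^{j+k} = x_1^k\}} \right) \right| < \epsilon_k. \qquad (5)$$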
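Proof: First observe that (5) implies that for all $u_1^{N_k}, v_1^{N_k} \in \mathcal{U}_{N_k}$ and for all $j < k$,

$$\sum_{x_1^j \in \{0,1\}^j} \left| \frac{1}{N_k - k + 1} \sum_{i=0}^{N_k - k} \left( I_{\{u_{i+1}^{i+j} = x_1^j\}} - I_{\{v_{i+1}^{i+j} = x_1^j\}} \right) \right| < \epsilon_k. \qquad (6)$$

(Indeed,

$$\sum_{x_1^j \in \{0,1\}^j} \left| \frac{1}{N_k - k + 1} \sum_{i=0}^{N_k - k} \left( I_{\{u_{i+1}^{i+j} = x_1^j\}} - I_{\{v_{i+1}^{i+j} = x_1^j\}} \right) \right|
= \sum_{x_1^j \in \{0,1\}^j} \left| \sum_{x_{j+1}^k \in \{0,1\}^{k-j}} \frac{1}{N_k - k + 1} \sum_{i=0}^{N_k - k} \left( I_{\{u_{i+1}^{i+k} = x_1^k\}} - I_{\{v_{i+1}^{i+k} = x_1^k\}} \right) \right|$$

$$\le \sum_{x_1^j \in \{0,1\}^j} \sum_{x_{j+1}^k \in \{0,1\}^{k-j}} \left| \frac{1}{N_k - k + 1} \sum_{i=0}^{N_k - k} \left( I_{\{u_{i+1}^{i+k} = x_1^k\}} - I_{\{v_{i+1}^{i+k} = x_1^k\}} \right) \right|,$$

which is, by assumption, less than $\epsilon_k$.)

Now for any $M < k$ and $u_1^{N_k}, v_1^{N_k} \in \mathcal{U}_{N_k}$,

$$\sum_{x_1^M \in \{0,1\}^M} \left| \frac{1}{N_k - M + 1} \sum_{i=0}^{N_k - M} \left( I_{\{u_{i+1}^{i+M} = x_1^M\}} - I_{\{v_{i+1}^{i+M} = x_1^M\}} \right) \right|$$

$$\le \frac{N_k - k + 1}{N_k - M + 1} \sum_{x_1^M \in \{0,1\}^M} \left| \frac{1}{N_k - k + 1} \sum_{j=0}^{N_k - k} \left( I_{\{u_{j+1}^{j+M} = x_1^M\}} - I_{\{v_{j+1}^{j+M} = x_1^M\}} \right) \right| + \frac{2(k - M)}{N_k - M + 1}$$

$$< \epsilon_k + \frac{2(k - M)}{N_k - M + 1},$$

where we used (6) and the fact that the $k - M$ remaining indices $i = N_k - k + 1, \dots, N_k - M$ contribute at most 2 each.
Thus for any $M < k$ and $u_1^{N_k}, v_1^{N_k} \in \mathcal{U}_{N_k}$,

$$\sum_{x_1^M \in \{0,1\}^M} \left| \frac{1}{N_k - M + 1} \sum_{i=0}^{N_k - M} \left( I_{\{u_{i+1}^{i+M} = x_1^M\}} - I_{\{v_{i+1}^{i+M} = x_1^M\}} \right) \right| < \epsilon_k + \frac{2(k - M)}{N_k - M + 1}. \qquad (7)$$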
Assume the process is stationary but not ergodic. Then for some $M$ and for some
$a_1^M \in \{0,1\}^M$,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} I_{\{X_{j+1}^{j+M} = a_1^M\}}$$

almost surely exists, but the limit is not a constant on any set of probability one. (Cf.
Theorem 7.2.1 in Gray (1988).) This means that there exist $\delta > 0$ and a positive integer
$n_0$ such that for all $n \ge n_0$ there will be sets $E_n, F_n \subseteq \{0,1\}^n$ of probability $> 10\delta$ such
that for all $u_1^n \in E_n$ and $v_1^n \in F_n$,

$$\left| \frac{1}{n - M + 1} \sum_{i=0}^{n - M} \left( I_{\{u_{i+1}^{i+M} = a_1^M\}} - I_{\{v_{i+1}^{i+M} = a_1^M\}} \right) \right| > 10\delta.$$

For $M$ and $\delta$ above choose $k$ large enough so that $M < k$, $\epsilon_k < 0.5\delta$,
$2(k - M)/(N_k - M + 1) < 0.5\delta$, and $N_k \ge n_0$. (Such a $k$ exists since $\epsilon_k \to 0$ and $k/N_k < 1/k \to 0$.)
However this leads to a contradiction since $\mathcal{U}_{N_k}$ fills all but $\delta$ while on the sets $E_{N_k}$ and $F_{N_k}$,
which have probability at least $10\delta$, the empirical distributions differ. ($\mathcal{U}_{N_k}$ should have
nonempty intersection with both $E_{N_k}$ and $F_{N_k}$, and so on $\mathcal{U}_{N_k}$ the empirical distribution
should differ by $10\delta$, which contradicts (7) and the fact that $\epsilon_k + 2(k - M)/(N_k - M + 1) < \delta$.)
The proof of Lemma 4 is complete.
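The quantity controlled in (4) and in Lemma 4 is the total discrepancy between the empirical $k$-block frequencies of two words of the same length. A minimal sketch of that computation follows; the names `block_freqs` and `block_discrepancy` are hypothetical, and exact rationals are used so that small cases can be checked by hand.

```python
from collections import Counter
from fractions import Fraction

def block_freqs(seq, k):
    """Empirical frequencies of the k-blocks seq[j:j+k], j = 0, ..., len(seq) - k."""
    n = len(seq) - k + 1
    counts = Counter(tuple(seq[j:j + k]) for j in range(n))
    return {block: Fraction(c, n) for block, c in counts.items()}

def block_discrepancy(u, v, k):
    """Sum over all k-blocks x of |freq_u(x) - freq_v(x)|, for len(u) == len(v) >= k."""
    fu, fv = block_freqs(u, k), block_freqs(v, k)
    blocks = set(fu) | set(fv)
    return sum((abs(fu.get(b, Fraction(0)) - fv.get(b, Fraction(0))) for b in blocks),
               Fraction(0))
```

For example, `block_discrepancy([0, 1, 0, 1], [0, 0, 1, 1], 2)` is 4/3: the first word has 2-block frequencies 2/3 for 01 and 1/3 for 10, while the second has 1/3 each for 00, 01 and 11. The discrepancy is always at most 2, which is where the factor 2 in the edge term of the proof comes from.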